Abstract. We construct a framework for studying clustering algorithms, which includes two key ideas:
persistence and functoriality. The first encodes the idea that the output of a clustering scheme should carry
a multiresolution structure, the second the idea that one should be able to compare the results of clustering
algorithms as one varies the data set, for example by adding points or by applying functions to it. We show
that within this framework, one can prove a theorem analogous to one of J. Kleinberg [Kle02], in which one
obtains an existence and uniqueness theorem instead of a non-existence result. We explore further properties
of this unique scheme: stability and convergence are established.



1. Introduction 



Clustering techniques play a very central role in various parts of data analysis. They can give important 
clues to the structure of data sets, and therefore suggest results and hypotheses in the underlying science. 
There are many interesting methods of clustering available, which have been applied to good effect in dealing 
with many datasets of interest, and they are regarded as important methods in exploratory data analysis. 

Despite being one of the most commonly used tools for unsupervised exploratory data analysis, and despite
its extensive literature, very little is known about the theoretical foundations of clustering methods.

The general question of which methods are "best", or most appropriate for a particular problem, or
how significant a particular clustering is, has not been addressed as frequently. One problem is that many
methods involve particular choices to be made at the outset, for example how many clusters there should
be, or the value of a particular thresholding quantity. In addition, some methods depend on artifacts in the
data, such as the particular order in which the elements are listed. In [Kle02], J. Kleinberg proves a very
interesting impossibility result for the problem of even defining a clustering scheme with some rather mild
invariance properties. He also points out that his results shed light on the trade-offs one has to make in
choosing clustering algorithms. In this paper, we produce a variation on this theme, which we believe also
has implications for how one thinks about and applies clustering algorithms.

In addition, we study the precise quantitative (or metric) stability and convergence/consistency of one
particular clustering scheme which is characterized by one of our results.
We summarize the two main points in our approach.



Persistence: We believe that the output of clustering algorithms shouldn't be a single set of clusters, 
but rather a more structured object which encodes "multiscale" or "multiresolution" information about the 
underlying dataset. The reason is that data can often intrinsically possess structure at various different 
scales, as in Figure [1] below. Clustering techniques should reflect this structure, and provide methods for 
representing and analyzing it. 

Ideally, users should be presented with a readily computable and presentable object which gives them
the option of choosing the proper scale for the analysis, or perhaps of interpreting the multiscale invariant
directly, rather than being asked to choose a scale or having it chosen for them. It is widely accepted that
clustering is ultimately itself a tool for exploratory data analysis [vLBD05]. In some sense, it is therefore
totally acceptable to provide this multiscale invariant, whenever available, and let the user pick different
scale thresholds that will yield different partitions of the data. Once we accept this, we can concentrate on
answering theoretical questions regarding schemes that output this kind of information. Our analysis will
not, however, rule out clustering methods that provide a one-scale view of the data, since, formally, one can
consider such a scheme as one that at all scales gives the same information, cf. Example 2.2.

Figure 1. Dataset with multiscale structure and its corresponding dendrogram.

We choose a particular way of representing this multiscale information: we use the formalism of persistent
sets, which is introduced in Section 2, Definition 2.1. The idea of showing the multiscale clustering view of
the dataset is widely used in gene expression data analysis, where it takes the form of dendrograms.

Functoriality: As our replacement for the constraints discussed in [Kle02], we will use instead the notion
of functoriality, which has been a very useful framework for the discussion of a variety of problems within
mathematics over the last few decades. For a discussion of categories and functors, see [ML98]. Our idea is
that clusters should be viewed as the stochastic analogue of the mathematical concept of path components.
Recall (see, e.g. [Mun75]) that the path components of a topological space $X$ are the equivalence classes of
points in the space under the equivalence relation $\sim_{path}$, where, for $x, y \in X$, we have $x \sim_{path} y$ if and only
if there is a continuous map $\varphi : [0, 1] \to X$ so that $\varphi(0) = x$ and $\varphi(1) = y$. In other words, two points in $X$
are in the same path component if they are connected by a continuous path in $X$. This set of components
is denoted by $\pi_0(X)$. The assignment $X \mapsto \pi_0(X)$ is said to be functorial, in that given a continuous map
$f : X \to Y$ (morphism of topological spaces), there is a natural map of sets $\pi_0(f) : \pi_0(X) \to \pi_0(Y)$, which
is defined by requiring that $\pi_0(f)$ carries the path component of a point $x \in X$ to the path component of
$f(x) \in Y$. This notion has been critical in many aspects of geometry; it provides the basis for the methods
of organizing geometric objects combinatorially which are referred to as combinatorial or simplicial topology.

The input to clustering algorithms is not, of course, a topological space. Rather, it is typically point cloud 
data, finite sets of points lying in a Euclidean space of some dimension, or perhaps in some other metric 
space, such as a tree or a collection of words in some alphabet equipped with a metric. We will therefore 
think of it as a finite metric space (see [Mun75] for a discussion of metric spaces). There is a natural notion
of what is meant by a map of metric spaces, which one can think of as loosely analogous to continuity. This
notion has been used in other contexts in the past, see for example [Isb64]. Similarly, we define a natural
notion of what is meant by a morphism of the persistent sets defined above, and require functoriality for
the clustering algorithms we consider in terms of these notions of morphisms. For the time being, the reader
not familiar with the concept can think of functoriality as a notion of coarse stability/consistency. By
varying the richness of the class of morphisms between metric spaces one can control how stringent the
conditions imposed on the clustering algorithms are. Functoriality can therefore be interpreted as a notion of
coarse stability of these clustering algorithms.

In [McC02], the idea of using categorical and functorial ideas in statistics has been proposed as a formalism
for defining what is meant by statistical models. One aspect of our work is to show that the same ideas, which 
are so powerful in many other aspects of mathematics, can be used to understand the nature of algorithms 
for accomplishing statistical tasks. 

We summarize the main features of our point of view. 

(a) It makes explicit the notion of multiscale representation of the set of clusters. 

(b) By varying the degree of functoriality (i.e. by considering different notions of morphism on the domain 
of point cloud data) one can reason about the existence and properties of various schemes. We illustrate this 
possibility in Section 3. In particular, we are able to prove a uniqueness theorem for clustering algorithms with
one natural notion of functoriality.

(c) Beyond the conceptual advantages cited above, functoriality can be directly useful in analyzing
datasets. The property can be used to study qualitative geometric properties of point cloud data,
including more subtle geometric information than clustering, such as the presence of "loopy" behavior or higher
dimensional analogues. See e.g. [CIdSZ08] for an example of this point of view. We will also present an
example in Subsection 3.2. In addition, the functoriality property can be used to analyze functions on the
datasets, by studying the behavior of sublevel sets of the function under clustering. One version of this idea
builds probabilistic versions of the Reeb graph. See [SMC07] for a number of examples of how this can work.

Other, different, notions of stability of clustering schemes have appeared in the literature, see [Rag82,
BDvLP06] and references therein. We touch upon similar concepts in Section 5.

The organization of the paper is as follows. In Section 2 we introduce the main objects that model the
output of clustering algorithms together with some important examples. Section 3 introduces the concepts
of categories and functors, and the idea of functoriality is discussed. We present our main characterization
results in Section 4. The quantitative study of stability and consistency is presented in Section 5. Further
applications of the concept of functoriality are discussed in Section 6 and concluding remarks are presented
in Section 7.

2. Persistence 

In this section we define the objects which are the output of the clustering algorithms we will be working
with. These objects will encode the notion of "multiscale" or "multiresolution" sets discussed in the
introduction.

Let $\mathcal{P}(X)$ denote the set of partitions of the (finite) set $X$.

Definition 2.1. A persistent set is a pair $(X, \theta)$, where $X$ is a finite set, and $\theta$ is a function from the
non-negative real line $[0, +\infty)$ to $\mathcal{P}(X)$ so that the following properties hold.

(1) If $r \le s$, then $\theta(r)$ refines $\theta(s)$.

(2) For any $r$, there is a number $\varepsilon > 0$ so that $\theta(r') = \theta(r)$ for all $r' \in [r, r + \varepsilon]$.

If in addition there exists $t > 0$ s.t. $\theta(r)$ consists of the single block partition for all $r \ge t$, then we say that
$(X, \theta)$ is a dendrogram.¹

The intuition is that the set of blocks of the partition $\theta(r)$ should be regarded as $X$ viewed at scale $r$.

Example 2.1. Let $(X, d)$ be a finite metric space. Then we can associate to $(X, d)$ the persistent set
whose underlying set is $X$, and where blocks of the partition $\theta(r)$ consist of the equivalence classes under
the equivalence relation $\sim_r$, where $x \sim_r x'$ if and only if there is a sequence $x_0, x_1, \ldots, x_t \in X$ so that
$x_0 = x$, $x_t = x'$, and $d(x_i, x_{i+1}) \le r$ for all $i$.
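The construction of Example 2.1 can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `single_linkage_partition`, the union-find helper and the four sample points on the real line are all assumptions made for the example.

```python
from itertools import combinations

def single_linkage_partition(points, d, r):
    """Blocks of theta(r): equivalence classes of x ~_r x' from Example 2.1."""
    parent = {x: x for x in points}

    def find(x):
        # Follow parent pointers to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Two points within distance r are joined; chains arise by transitivity.
    for x, y in combinations(points, 2):
        if d(x, y) <= r:
            parent[find(x)] = find(y)

    blocks = {}
    for x in points:
        blocks.setdefault(find(x), set()).add(x)
    return {frozenset(b) for b in blocks.values()}

# Hypothetical data: four points on the real line with d(a, b) = |a - b|.
pts = [0.0, 1.0, 3.0, 7.0]
dist = lambda a, b: abs(a - b)
for r in (0.5, 1.0, 2.0, 4.0):
    print(r, sorted(sorted(b) for b in single_linkage_partition(pts, dist, r)))
```

As $r$ grows the partitions become coarser, which is property (1) of Definition 2.1, and for $r$ at least 4 a single block remains, so this particular persistent set is a dendrogram.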

Example 2.2. A more trivial example is one in which $\theta(r)$ is constant, i.e. consists of a single partition.
This is the scale free notion of clustering. Examples are $k$-means clustering and spectral clustering.

Example 2.3. Here we consider the family of Agglomerative Hierarchical clustering techniques, [JD88].
We (re)define these by the recursive procedure described next. Let $X = \{x_1, \ldots, x_n\}$ and let $\mathcal{L}$ denote a
family of linkage functions, i.e. functions which one uses for defining the distance between two clusters. Fix
$l \in \mathcal{L}$. For each $R > 0$ consider the equivalence relation $\sim_{l,R}$ on blocks of a partition $\Pi \in \mathcal{P}(X)$, given
by $B \sim_{l,R} B'$ if and only if there is a sequence of blocks $B = B_1, \ldots, B_s = B'$ in $\Pi$ with $l(B_k, B_{k+1}) \le R$
for $k = 1, \ldots, s - 1$. Consider the sequences $r_1, r_2, \ldots \in [0, \infty)$ and $\Theta_1, \Theta_2, \ldots \in \mathcal{P}(X)$ given by $\Theta_1 :=
\{x_1, \ldots, x_n\}$ and for $i \ge 1$, $\Theta_{i+1} = \Theta_i / \sim_{l,r_i}$ where $r_i := \min\{l(B, B'),\ B, B' \in \Theta_i,\ B \ne B'\}$. Finally, we
define $\theta^l : [0, \infty) \to \mathcal{P}(X)$ by $r \mapsto \theta^l(r) := \Theta_{i(r)}$ where $i(r) := \max\{i \mid r_i \le r\}$. Standard choices for $l$ are
single linkage: $l(B, B') = \min_{x \in B} \min_{x' \in B'} d(x, x')$; complete linkage: $l(B, B') = \max_{x \in B} \max_{x' \in B'} d(x, x')$; and
average linkage: $l(B, B') = \frac{\sum_{x \in B} \sum_{x' \in B'} d(x, x')}{\#B \cdot \#B'}$. It is easily verified that the notion discussed in Example 2.1 is
equivalent to $\theta^l$ when $l$ is the single linkage function. Note that, unlike the usual definition of agglomerative



hierarchical clustering, at each step of the inductive definition we allow for more than two clusters to be
merged.

¹ In the paper we will be using the word dendrogram to refer both to the object defined here and to the standard graphical
representation of it.
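The recursive procedure of Example 2.3 can be sketched as follows. This is an illustrative reading of the definition, not the authors' code: the helper names `quotient` and `agglomerate` and the sample points are assumptions, and the sketch only returns the sequences of radii and partitions without fixing the indexing convention used for $\theta^l$.

```python
from itertools import combinations

def single(B, C, d):
    return min(d(x, y) for x in B for y in C)

def complete(B, C, d):
    return max(d(x, y) for x in B for y in C)

def average(B, C, d):
    return sum(d(x, y) for x in B for y in C) / (len(B) * len(C))

def quotient(blocks, linkage, d, R):
    """Merge blocks related by ~_{l,R}: chains of blocks with l(B_k, B_{k+1}) <= R."""
    blocks = list(blocks)
    label = list(range(len(blocks)))
    for i, j in combinations(range(len(blocks)), 2):
        if linkage(blocks[i], blocks[j], d) <= R:
            old, new = label[j], label[i]
            label = [new if c == old else c for c in label]
    merged = {}
    for c, b in zip(label, blocks):
        merged[c] = merged.get(c, frozenset()) | b
    return set(merged.values())

def agglomerate(points, d, linkage):
    """Return the radii r_1, r_2, ... and the partitions Theta_1, Theta_2, ..."""
    theta = {frozenset([x]) for x in points}
    radii, partitions = [], [theta]
    while len(theta) > 1:
        r = min(linkage(B, C, d) for B, C in combinations(theta, 2))
        theta = quotient(theta, linkage, d, r)
        radii.append(r)
        partitions.append(theta)
    return radii, partitions

# Hypothetical data with a tie: d(0, 1) = d(1, 2) = 1, so three points merge at once.
pts = [0, 1, 2, 5]
dist = lambda a, b: abs(a - b)
radii, partitions = agglomerate(pts, dist, single)
```

With the tie at distance 1, the first step merges three singletons in one go, which is exactly the departure from binary merging that the example points out.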

We will be using the persistent sets which arise out of Example 2.1. It is of course the case that the
persistent set carries much more information than a single set of clusters. One can ask whether it carries too
much information, in the sense that either (a) one cannot obtain useful interpretations from it or (b) it is
computationally intractable. We claim that it can usually be usefully interpreted, and can be effectively and
efficiently computed. One can observe this as follows. Since there are only a finite number of partitions of
$X$, a persistent set $\theta$ gives a partition of $\mathbb{R}_+$ into a finite collection $\mathcal{I}$ of intervals of the form $[r, r')$, together
with one interval of the form $[r, +\infty)$. For each such interval, every number in the interval corresponds to
the same partition of $X$.

We claim that knowledge of these intervals is a key piece of information about the persistent sets arising
from Examples 2.1 and 2.3 above. The reason is that long intervals in $\mathcal{I}$ correspond to large ranges of
values of the scale parameter in which the associated cluster decomposition doesn't change. One would then
regard the partition into clusters corresponding to that interval as likely to represent significant structure
present at the given range of scales. If there is only one long interval (aside from the infinite interval of
the form $[r, +\infty)$) in $\mathcal{I}$, then one is led to believe that there is only one interesting range of scales, with a
unique decomposition into clusters. However, if there is more than one long interval, then it suggests that
the object has significant multiscale behavior, see Figure 1. Of course, the determination of what is "long"
and what is "short" will be problem dependent, but choosing thresholds for the length of the intervals will
give definite ranges of scales. As for the computability, the persistent sets associated to a finite metric space
can be readily computed using (conveniently modified) hierarchical clustering techniques, or the methods of
persistent homology (see [ZC04]).
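The interval bookkeeping described above is simple enough to sketch directly. The snippet below assumes that the scales at which the partition changes are already known (for instance from a hierarchical clustering run); the function names, the sample breakpoints and the length threshold are hypothetical choices made for illustration.

```python
def constancy_intervals(breakpoints):
    """Split [0, +inf) at the scales where the partition changes."""
    cuts = [0.0] + sorted({b for b in breakpoints if b > 0}) + [float("inf")]
    return list(zip(cuts[:-1], cuts[1:]))

def long_intervals(breakpoints, min_length):
    """Finite intervals [r, r') whose length is at least min_length."""
    return [(a, b) for a, b in constancy_intervals(breakpoints)
            if b != float("inf") and b - a >= min_length]

# Hypothetical merge scales: two tight groups of merges and one late merge.
scales = [0.1, 0.12, 1.5, 1.6, 9.0]
print(long_intervals(scales, min_length=1.0))
```

Here two finite intervals exceed the threshold, which under the reading above would suggest two distinct ranges of scales worth inspecting. The threshold itself remains problem dependent.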

3. Categories, functors and functoriality 

3.1. Definitions and Examples. In this section, we will give a brief description of the theory of categories 
and functors, which will be the framework in which we state the constraints we will require of our clustering 
algorithms. An excellent reference for these ideas is [ML98].

Categories are useful mathematical constructs that encode the nature of certain objects of interest together 
with a set of admissible/interesting/useful maps between them. This formalism is extremely useful for 
studying classes of mathematical objects which share a common structure, such as sets, groups, vector 
spaces, or topological spaces. The definition is as follows. 

Definition 3.1. A category $\underline{C}$ consists of

• A collection of objects $ob(\underline{C})$ (e.g. sets, groups, vector spaces, etc.)

• For each pair of objects $X, Y \in ob(\underline{C})$, a set $\mathrm{Mor}_{\underline{C}}(X, Y)$, the morphisms from $X$ to $Y$ (e.g. maps of sets from $X$ to $Y$, homomorphisms of
groups from $X$ to $Y$, linear transformations from $X$ to $Y$, etc. respectively)

• Composition operations:
$\circ : \mathrm{Mor}_{\underline{C}}(X, Y) \times \mathrm{Mor}_{\underline{C}}(Y, Z) \to \mathrm{Mor}_{\underline{C}}(X, Z)$, corresponding to composition of set maps, group
homomorphisms, linear transformations, etc.

• For each object $X \in \underline{C}$, a distinguished element $id_X \in \mathrm{Mor}_{\underline{C}}(X, X)$

The composition is assumed to be associative in the obvious sense, and for any $f \in \mathrm{Mor}_{\underline{C}}(X, Y)$, it is
assumed that $id_Y \circ f = f$ and $f \circ id_X = f$.

Here are the relevant examples for this paper. 

Example 3.1. We will construct three categories $\underline{\mathcal{M}}^{iso}$, $\underline{\mathcal{M}}^{mon}$, and $\underline{\mathcal{M}}^{gen}$, whose collections of objects will
all consist of the collection of finite metric spaces. Let $(X, d_X)$ and $(Y, d_Y)$ denote finite metric spaces. A set
map $f : X \to Y$ is said to be distance non-increasing if for all $x, x' \in X$, we have $d_Y(f(x), f(x')) \le d_X(x, x')$.
It is easy to check that compositions of distance non-increasing maps are also distance non-increasing, and
it is also clear that $id_X$ is always distance non-increasing. We therefore have the category $\underline{\mathcal{M}}^{gen}$, whose
objects are finite metric spaces, and so that for any objects $X$ and $Y$, $\mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(X, Y)$ is the set of distance
non-increasing maps from $X$ to $Y$, cf. [Isb64] for another use of this class of maps. We say that a distance
non-increasing map is monic if it is an inclusion as a set map. It is clear that compositions of monic maps are
monic, and that all identity maps are monic, so we have the new category $\underline{\mathcal{M}}^{mon}$, in which $\mathrm{Mor}_{\underline{\mathcal{M}}^{mon}}(X, Y)$
consists of the monic distance non-increasing maps. Finally, if $(X, d_X)$ and $(Y, d_Y)$ are finite metric spaces,
$f : X \to Y$ is an isometry if $f$ is bijective and $d_Y(f(x), f(x')) = d_X(x, x')$ for all $x$ and $x'$. It is clear that, as
above, one can form a category $\underline{\mathcal{M}}^{iso}$ whose objects are finite metric spaces and whose morphisms are the
isometries. It is clear that we have inclusions

(3-1)
$$\underline{\mathcal{M}}^{iso} \subset \underline{\mathcal{M}}^{mon} \subset \underline{\mathcal{M}}^{gen}$$

of subcategories (defined as in [ML98]). Note that although the inclusions are bijections on object sets, they
are proper inclusions on morphism sets, i.e. they are not in general surjective.
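The three classes of morphisms can be told apart with direct checks on finite data. The sketch below is only an illustration: maps are represented as Python dictionaries, "monic" is read as "injective and distance non-increasing" (an assumption about the intended meaning of "inclusion as a set map"), and the sample spaces are hypothetical.

```python
from itertools import combinations

def is_distance_non_increasing(f, X, dX, dY):
    """Morphism test for M^gen: d_Y(f(x), f(x')) <= d_X(x, x') for all pairs."""
    return all(dY(f[x], f[y]) <= dX(x, y) for x, y in combinations(X, 2))

def is_monic(f, X, dX, dY):
    """Morphism test for M^mon: injective and distance non-increasing."""
    injective = len({f[x] for x in X}) == len(X)
    return injective and is_distance_non_increasing(f, X, dX, dY)

def is_isometry(f, X, Y, dX, dY):
    """Morphism test for M^iso: bijective and distance preserving."""
    bijective = len(X) == len(Y) and {f[x] for x in X} == set(Y)
    return bijective and all(dY(f[x], f[y]) == dX(x, y)
                             for x, y in combinations(X, 2))

# Hypothetical spaces on the real line with d(a, b) = |a - b|.
d = lambda a, b: abs(a - b)
X, Y = [0, 2, 4], [0, 1, 2]
halve = {0: 0, 2: 1, 4: 2}       # shrinks every distance by a factor of two
collapse = {0: 0, 2: 0, 4: 0}    # sends everything to one point
identity = {x: x for x in X}
```

In this toy data `collapse` is a morphism of $\underline{\mathcal{M}}^{gen}$ only, `halve` is also monic but not an isometry, and `identity` belongs to all three categories, which illustrates why the inclusions in (3-1) are proper on morphism sets.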

We will also construct a category of persistent sets. 

Example 3.2. Let $(X, \theta)$, $(Y, \eta)$ be persistent sets. For any partition $\Pi$ of a set $Y$, and any set map
$f : X \to Y$, we define $f^*(\Pi)$ to be the partition of $X$ whose blocks are the sets $f^{-1}(B)$, as $B$ ranges over
the blocks of $\Pi$. A map of sets $f : X \to Y$ is said to be persistence preserving if for each $r \in \mathbb{R}_+$, we have
that $\theta(r)$ is a refinement of $f^*(\eta(r))$. It is easily verified that the composite of persistence preserving maps
is persistence preserving, and that any identity map is persistence preserving, and it is therefore clear that
we may define a category $\underline{\mathcal{P}}$ whose objects are persistent sets, and where $\mathrm{Mor}_{\underline{\mathcal{P}}}((X, \theta), (Y, \eta))$ consists of the
set maps from $X$ to $Y$ which are persistence preserving. A simple example is shown in Figure 2.
Figure 2. Two persistent sets $(X, \theta)$ and $(Y, \eta)$ represented by their dendrograms. On
the left, one defined on the set $X = \{A, B, C\}$ and on the right, one defined on the set
$Y = \{A', B', C'\}$. Consider the given set map $f : X \to Y$. Then we see that $f$ is persistence
preserving since for each $r \ge 0$, the partition $\theta(r)$ is a refinement of $f^*(\eta(r))$. Indeed, there
are three interesting ranges of values of $r$. Pick for example $r$ in the orange shaded area:
$r \in [1, 2)$. Then $\eta(r) = \{\{A', B'\}, \{C'\}\}$ and hence $f^*(\eta(r)) = \{f^{-1}(\{A', B'\}), f^{-1}(\{C'\})\} =
\{\{A, B\}, \{C\}\}$, which is indeed refined by $\theta(r) = \{\{A\}, \{B\}, \{C\}\}$. One proceeds similarly for
the other two cases.
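The persistence preserving condition of Example 3.2 can be checked mechanically for piecewise constant persistent sets. The sketch below is hedged in two ways: the dendrograms are hypothetical (they agree with the caption of Figure 2 on the range $[1, 2)$ but the remaining merge scales are invented), and the check is only run at the breakpoints, which suffices here because both persistent sets are constant between breakpoints. The elements of $Y$ are written in lowercase instead of with primes.

```python
def P(*blocks):
    """Build a partition from strings, e.g. P("AB", "C") = {{A, B}, {C}}."""
    return {frozenset(b) for b in blocks}

def step(levels):
    """Piecewise constant persistent set from [(start_scale, partition), ...]."""
    def theta(r):
        current = None
        for start, partition in levels:
            if start <= r:
                current = partition
        return current
    return theta

def pullback(f, X, partition_Y):
    """f^*(Pi): the nonempty preimages f^{-1}(B) of the blocks B of Pi."""
    blocks = (frozenset(x for x in X if f[x] in B) for B in partition_Y)
    return {b for b in blocks if b}

def refines(Pi, Sigma):
    """True if every block of Pi is contained in some block of Sigma."""
    return all(any(b <= c for c in Sigma) for b in Pi)

def is_persistence_preserving(f, X, theta, eta, scales):
    return all(refines(theta(r), pullback(f, X, eta(r))) for r in scales)

# Hypothetical dendrograms on X = {A, B, C} and Y = {a, b, c}.
theta = step([(0, P("A", "B", "C")), (2, P("AB", "C")), (3, P("ABC"))])
eta = step([(0, P("a", "b", "c")), (1, P("ab", "c")), (2, P("abc"))])
f = {"A": "a", "B": "b", "C": "c"}
g = {"a": "A", "b": "B", "c": "C"}   # the inverse set map
```

With these choices `f` is persistence preserving while its inverse `g` is not, since at scale 1 the block $\{a, b\}$ of $\eta$ is not contained in any block of the pulled back partition. This shows that a bijection of underlying sets need not be an isomorphism in $\underline{\mathcal{P}}$.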



We next introduce the key concept in our discussion, that of a functor. We give the formal definition first. 

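Definition 3.2. Let $\underline{C}$ and $\underline{D}$ be categories. Then a functor from $\underline{C}$ to $\underline{D}$ consists of

• A map of sets $F : ob(\underline{C}) \to ob(\underline{D})$

• For every pair of objects $X, Y \in \underline{C}$ a map of sets $\Phi(X, Y) : \mathrm{Mor}_{\underline{C}}(X, Y) \to \mathrm{Mor}_{\underline{D}}(FX, FY)$ so that

(1) $\Phi(X, X)(id_X) = id_{F(X)}$ for all $X \in ob(\underline{C})$

(2) $\Phi(X, Z)(g \circ f) = \Phi(Y, Z)(g) \circ \Phi(X, Y)(f)$ for all $f \in \mathrm{Mor}_{\underline{C}}(X, Y)$ and $g \in \mathrm{Mor}_{\underline{C}}(Y, Z)$

Remark 3.1. In the interest of clarity, we often refer to the pair $(F, \Phi)$ as a single letter $F$. See diagram
(3-2) in Example 3.5 below for an example.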

A morphism $f : X \to Y$ which has a two sided inverse $g : Y \to X$, so that $f \circ g = id_Y$ and $g \circ f = id_X$,
is called an isomorphism. Two objects which are isomorphic are intuitively thought of as "structurally
indistinguishable" in the sense that they are identical except for naming or choice of coordinates. For
example, in the category of sets, the sets $\{1, 2, 3\}$ and $\{A, B, C\}$ are isomorphic, since they are identical
except for the choice made in labelling the elements. We illustrate this definition with some examples.

Example 3.3. (Forgetful functors) When one has two categories $\underline{C}$ and $\underline{D}$, where the objects in $\underline{C}$ are
objects in $\underline{D}$ equipped with some additional structure and the morphisms in $\underline{C}$ are simply the morphisms
in $\underline{D}$ which preserve that structure, then we obtain the "forgetful functor" from $\underline{C}$ to $\underline{D}$, which carries the
object in $\underline{C}$ to the same object in $\underline{D}$, but regarded without the additional structure. For example, a group
can be regarded as a set with the additional structure of multiplication and inverse maps, and the group
homomorphisms are simply the set maps which respect that structure. Accordingly, we have the functor
from the category of groups to the category of sets which "forgets the multiplication and inverse". Similarly,
we have the forgetful functor from $\underline{\mathcal{P}}$ to the category of sets, which forgets the presence of $\theta$ in the persistent
set $(X, \theta)$.

Example 3.4. The inclusions $\underline{\mathcal{M}}^{iso} \subset \underline{\mathcal{M}}^{mon} \subset \underline{\mathcal{M}}^{gen}$ are both functors.

Any given clustering scheme is a procedure $F$ which takes as input a finite metric space $(X, d_X)$, that
is, an object in $ob(\underline{\mathcal{M}}^{gen})$, and delivers as output a persistent set, that is, an object in $ob(\underline{\mathcal{P}})$. The concept
of functoriality refers to the additional condition that the clustering procedure maps a pair of input objects
into a pair of output objects in a manner which is consistent/stable with respect to the morphisms attached
to the input and output spaces. When this happens, we say that the clustering scheme is functorial. This
notion of consistency/stability is made precise in Definition 3.2 and described by diagram (3-2).

Now, the idea is to regard clustering algorithms (that output a persistent set) as functors. Assume for
instance we want to consider "stability" with respect to all distance non-increasing maps. Then the correct category of
inputs (finite metric spaces) is $\underline{\mathcal{M}}^{gen}$ and the category of outputs is $\underline{\mathcal{P}}$. According to Definition 3.2, in order
to view a clustering scheme as a functor we need to specify (1) how it maps objects of $\underline{\mathcal{M}}^{gen}$ (finite metric
spaces) into objects of $\underline{\mathcal{P}}$ (persistent sets), and (2) how a valid morphism/map $f : (X, d_X) \to (Y, d_Y)$ between
two objects $(X, d_X)$ and $(Y, d_Y)$ in the input space/category $\underline{\mathcal{M}}^{gen}$ induces a map in the output category $\underline{\mathcal{P}}$,
see diagram (3-2) below.

We exemplify this through the construction of the key example for this paper. 

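Example 3.5. We define a functor

$\mathcal{R}^{gen} : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{P}}$

as follows. For a finite metric space $(X, d_X)$, we define $\mathcal{R}^{gen}(X, d_X)$ to be the persistent set $(X, \theta^{d_X})$, where
$\theta^{d_X}(r)$ is the partition associated to the equivalence relation $\sim_r$ defined in Example 2.1. This is clearly an
object in $\underline{\mathcal{P}}$. We also define how $\mathcal{R}^{gen}$ acts on maps $f : (X, d_X) \to (Y, d_Y)$: the value of $\mathcal{R}^{gen}(f)$ is simply
the set map $f$ regarded as a morphism from $(X, \theta^{d_X})$ to $(Y, \theta^{d_Y})$ in $\underline{\mathcal{P}}$. That it is a morphism in $\underline{\mathcal{P}}$ is easy
to check. This functorial construction is represented through the diagram below:

(3-2)
$$\begin{array}{ccc}
(X, d_X) & \xrightarrow{\ \mathcal{R}^{gen}\ } & (X, \theta^{d_X}) \\
{\scriptstyle f} \downarrow & & \downarrow {\scriptstyle \mathcal{R}^{gen}(f)} \\
(Y, d_Y) & \xrightarrow{\ \mathcal{R}^{gen}\ } & (Y, \theta^{d_Y})
\end{array}$$

where $\mathcal{R}^{gen}(f)$ is persistence preserving.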
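The claim that the set map $f$ is persistence preserving between the single linkage persistent sets can be tested numerically on small examples. The following sketch is an illustration under stated assumptions rather than a proof: the two metric spaces, the maps and the list of scales are hypothetical, and the helper names are not from the paper.

```python
from itertools import combinations

def single_linkage(points, d, r):
    """Partition of Example 2.1 at scale r, by repeatedly merging close blocks."""
    blocks = [{x} for x in points]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(blocks)), 2):
            if any(d(x, y) <= r for x in blocks[i] for y in blocks[j]):
                blocks[i] |= blocks.pop(j)
                merged = True
                break
    return {frozenset(b) for b in blocks}

def pullback(f, X, partition_Y):
    blocks = (frozenset(x for x in X if f[x] in B) for B in partition_Y)
    return {b for b in blocks if b}

def refines(Pi, Sigma):
    return all(any(b <= c for c in Sigma) for b in Pi)

def preserved(f, X, dX, Y, dY, scales):
    """Does theta^{d_X}(r) refine f^*(theta^{d_Y}(r)) at every listed scale?"""
    return all(refines(single_linkage(X, dX, r),
                       pullback(f, X, single_linkage(Y, dY, r)))
               for r in scales)

d = lambda a, b: abs(a - b)
scales = [0, 0.5, 1, 2, 3, 4, 7]

# A distance non-increasing (and non-injective) map between hypothetical spaces.
X, Y = [0, 1, 3, 7], [0, 1, 2]
f = {0: 0, 1: 1, 3: 2, 7: 2}

# A map that increases a distance, hence not a morphism of M^gen.
X2, Y2 = [0, 1], [0, 5]
h = {0: 0, 1: 5}
```

For the distance non-increasing map `f` the refinement holds at every tested scale, as the example asserts it must. For `h`, which stretches a distance, the refinement fails at scale 1, so the hypothesis on the morphisms cannot simply be dropped.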

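Example 3.6. By restricting $\mathcal{R}^{gen}$ to the subcategories $\underline{\mathcal{M}}^{iso}$ and $\underline{\mathcal{M}}^{mon}$, we obtain functors

$\mathcal{R}^{iso} : \underline{\mathcal{M}}^{iso} \to \underline{\mathcal{P}}$ and $\mathcal{R}^{mon} : \underline{\mathcal{M}}^{mon} \to \underline{\mathcal{P}}$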

Example 3.7. Let $\lambda$ be any positive real number. Then we define a functor $\sigma_\lambda : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{M}}^{gen}$ on objects
by

$\sigma_\lambda(X, d_X) = (X, \lambda d_X)$

and on morphisms by $\sigma_\lambda(f) = f$. One easily verifies that if $f$ satisfies the conditions for being a morphism
in $\underline{\mathcal{M}}^{gen}$ from $(X, d_X)$ to $(Y, d_Y)$, then it readily satisfies the conditions to be a morphism from $(X, \lambda d_X)$ to
$(Y, \lambda d_Y)$. Similarly, we define a functor $s_\lambda : \underline{\mathcal{P}} \to \underline{\mathcal{P}}$ by setting $s_\lambda(X, \theta) = (X, \theta^\lambda)$, where $\theta^\lambda(r) = \theta(r/\lambda)$.

In Section 4 we will be showing our main results. We will now have a brief digression to discuss other
situations in which, in our opinion, the concept of functoriality can be useful.
3.2. Intrinsic Value of Functoriality. By studying functorial methods of clustering, it is possible to
recover qualitative aspects of the geometric structure of a dataset. We illustrate this idea with a "toy"
example. We suppose that we have point cloud data which is concentrated around the unit circle. We
consider the projection of the data onto the x-axis, and cover the axis with two (overlapping) intervals
$U$ and $V$, pictured on the left in Figure 3 below as being red and yellow, with orange intersection. By
considering those portions of the dataset whose x-coordinate lies in $U$ and $V$ respectively, we obtain the red
and yellow subsets of the dataset pictured on the right below. Their intersection is pictured as orange, and
the arrows indicate that we have inclusions of the intersection into each of the pieces. Next, we note that if
we are dealing with a functorial clustering scheme, and cluster each of these subsets, we obtain the diagram
of clusters in the center of Figure 3. This is now a very simple combinatorial object.

Figure 3. Left: Covering of a circle by two intervals. Center: Corresponding diagram of
components. Right: Homotopy colimit of diagram in center figure.
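The toy example can be reproduced numerically. The sketch below makes several assumptions that are not in the text: the data are 40 equally spaced points on the unit circle, the two intervals are $U = (-\infty, 0.3]$ and $V = [-0.3, +\infty)$, clusters are single linkage clusters at scale 0.3, and the number of independent loops is computed only from the graph with one vertex per cluster and one edge per inclusion, which is a first approximation to the construction discussed in this subsection.

```python
import math
from itertools import combinations

def clusters(points, r):
    """Single linkage clusters at scale r for points in the plane."""
    blocks = [{p} for p in points]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(blocks)), 2):
            if any(math.dist(p, q) <= r for p in blocks[i] for q in blocks[j]):
                blocks[i] |= blocks.pop(j)
                merged = True
                break
    return [frozenset(b) for b in blocks]

n, r = 40, 0.3
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]
U = [p for p in circle if p[0] <= 0.3]
V = [p for p in circle if p[0] >= -0.3]
UV = [p for p in circle if -0.3 <= p[0] <= 0.3]
cU, cV, cUV = clusters(U, r), clusters(V, r), clusters(UV, r)

# One vertex per cluster, one edge for each inclusion of a cluster of the
# intersection into a cluster of U or of V.
vertices = ([("U", i) for i in range(len(cU))] +
            [("V", i) for i in range(len(cV))] +
            [("UV", i) for i in range(len(cUV))])
edges = [(("UV", i), (name, j))
         for i, c in enumerate(cUV)
         for name, cs in (("U", cU), ("V", cV))
         for j, b in enumerate(cs) if c <= b]

# Count connected components of this graph with a small union-find.
comp = {v: v for v in vertices}
def root(v):
    while comp[v] != v:
        v = comp[v]
    return v
for a, b in edges:
    comp[root(a)] = root(b)
components = len({root(v) for v in vertices})
loops = len(edges) - len(vertices) + components
```

Under these assumptions each of the two pieces has one cluster, the intersection has two, and the resulting graph has exactly one independent loop, in agreement with the circular shape of the data.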

There is a topological construction known as the homotopy colimit, which given any diagram of sets of any 
shape reconstructs a simplicial set (a slightly more flexible version of the notion of simplicial complex), 
and in particular a space. To first approximation, one builds a vertex for every element in any set in the 
diagram, and an edge between any two elements which are connected by a map in the diagram, and then 
attaches higher order simplices according to a well defined procedure. In the case of the diagram above, this 
constructs the space given in the rightmost part of Figure 3.

The details of the theory of simplicial sets and homotopy colimits are beyond the scope of this paper. A
thorough exposition is given in [BK72].

Functoriality is also quite useful when one is interested in studying the qualitative behavior of a real-valued
function / on a dataset, for example the output of a density estimator. Then it is useful to study the set 
of clusters in sublevel and superlevel sets of /, and understanding how the clusters behave under changes in 
the thresholds can help one understand the presence of saddle points and higher index critical points of the 
function. 

One example of this is two-parameter persistence constructions, [CSZonj . In this case, there is more 
structure than just persistent sets (trees/dendrograms) as defined in this paper. 
We will elaborate on another application of functoriality in Section 6.

4. Results 

We now study different clustering algorithms using the idea of functoriality. We have 3 possible "input"
categories ordered by inclusion (3-1). The idea is that studying functoriality over a larger category will
be more stringent/demanding than requiring functoriality over a smaller one. We now consider different
clustering algorithms and study whether they are functorial over our choice of the input category. We
start by analyzing functoriality over the least demanding one, $\underline{\mathcal{M}}^{iso}$, then we prove a uniqueness result for
functoriality over $\underline{\mathcal{M}}^{gen}$, and finally we study how relaxing the conditions imposed by the morphisms in
$\underline{\mathcal{M}}^{gen}$, namely, by restricting ourselves to the smaller but intermediate category $\underline{\mathcal{M}}^{mon}$, permits more
functorial clustering algorithms.

4.1. Functoriality over $\underline{\mathcal{M}}^{iso}$. This is the smallest category we will deal with. The morphisms in $\underline{\mathcal{M}}^{iso}$
are simply the bijective maps between datasets which preserve the distance function. As such, functoriality
of a clustering algorithm over $\underline{\mathcal{M}}^{iso}$ simply means that the output of the scheme doesn't depend on any
artifacts in the dataset, such as the way the points are named or the way in which they are ordered. Here
are some examples which illustrate the idea.

• The $k$-means algorithm (see [JD88]) is in principle allowed by our framework since $ob(\underline{\mathcal{P}})$ contains
all constant persistent sets. However it is not functorial on any of our input categories. It depends
both on a parameter $k$ (number of clusters) and on an initial choice of means, and therefore does not
depend on the metric structure alone.
• Agglomerative hierarchical clustering, in standard form, as described for example in [JD88], begins
with point cloud data and constructs a binary tree (or dendrogram) which describes the merging
of clusters as a threshold is increased. The lack of functoriality comes from the fact that when a
single threshold value corresponds to more than one data point, one is forced to choose an ordering
in order to decide which points to "agglomerate" first. This can easily be modified by relaxing the
requirement that the tree be binary. This is what we did in Example 2.3. In this case, one can view
these methods as functorial on $\underline{\mathcal{M}}^{iso}$, where the functor takes its values in arbitrary rooted trees. It
is understood that in this case, the notion of morphism for the output ($\underline{\mathcal{P}}$) is simply isomorphism of
rooted trees. In contrast, we see next that amongst these methods, when we impose that they be
functorial over the larger (more demanding) category $\underline{\mathcal{M}}^{gen}$, then only one of them passes the test.²

• Spectral clustering. As described in [vL07], typically, spectral methods consist of two different layers.
They first define a Laplacian matrix out of the dissimilarity matrix (given by $d_X$ in our case) and
then find eigenvalues and eigenvectors of this operator. The second layer is as follows: a natural
number $k$ must be specified, a projection to $\mathbb{R}^k$ is performed using the eigenfunctions, and clusters
are found by an application of the $k$-means clustering algorithm. Clearly, operations in the second
layer will fail to be functorial as they do not depend on the metric alone. However, the procedure
underlying the first layer is clearly functorial on $\underline{\mathcal{M}}^{iso}$, as eigenvalue computations are changed by
a permutation in a well defined, natural, way.

4.2. Functoriality over $\underline{\mathcal{M}}^{gen}$: a uniqueness theorem. In this section, as an example application of the
conceptual framework of functoriality, we will prove a theorem of the same flavor as the main theorem of
[Kle02], except that we prove existence and uniqueness on $\underline{\mathcal{M}}^{gen}$ instead of impossibility in our context.

Before stating and proving our theorem, it is interesting to point out why complete linkage and average
linkage (agglomerative) clustering, as defined in Example 2.3, are not functorial on $\underline{\mathcal{M}}^{gen}$. A simple example
explains this: consider the metric spaces $X = \{A, B, C\}$ with metric given by the edge lengths $\{4, 3, 5\}$ and
$Y = \{A', B', C'\}$ with metric given by the edge lengths $\{4, 3, 2\}$, as given in Figure 4. Obviously the map $f$
from $X$ to $Y$ with $f(A) = A'$, $f(B) = B'$ and $f(C) = C'$ is a morphism in $\underline{\mathcal{M}}^{gen}$. Note that for example for
$r = 3.5$ (shaded regions of the dendrograms in Figure 4) we have that the partition of $X$ is $\Pi_X = \{\{A, C\}, \{B\}\}$
whereas the partition of $Y$ is $\Pi_Y = \{\{A', B'\}, \{C'\}\}$ and thus $f^*(\Pi_Y) = \{\{A, B\}, \{C\}\}$. Therefore $\Pi_X$ does
not refine $f^*(\Pi_Y)$ as required by functoriality. The same construction yields a counter-example for average
linkage.
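The counter-example can be checked by direct computation. The text lists the edge lengths without saying which edge carries which length, so the assignment used below ($d_X(B, C) = 4$, $d_X(A, C) = 3$, $d_X(A, B) = 5$ and $d_Y(B', C') = 4$, $d_Y(A', C') = 3$, $d_Y(A', B') = 2$) is an assumption chosen to be consistent with the partitions stated above. The helper `partition_at` reads the procedure of Example 2.3 as "perform every merge whose radius is at most $r$", and the points of $Y$ are written in lowercase instead of with primes.

```python
from itertools import combinations

def complete(B, C, d):
    return max(d(x, y) for x in B for y in C)

def single(B, C, d):
    return min(d(x, y) for x in B for y in C)

def partition_at(points, d, linkage, r):
    """Partition after every merge of Example 2.3 whose radius is at most r."""
    theta = [frozenset([x]) for x in points]
    while len(theta) > 1:
        radius = min(linkage(B, C, d) for B, C in combinations(theta, 2))
        if radius > r:
            break
        label = list(range(len(theta)))
        for i, j in combinations(range(len(theta)), 2):
            if linkage(theta[i], theta[j], d) <= radius:
                label = [label[i] if c == label[j] else c for c in label]
        theta = [frozenset().union(*(b for c, b in zip(label, theta) if c == k))
                 for k in sorted(set(label))]
    return set(theta)

def metric(table):
    return lambda x, y: 0 if x == y else table[frozenset((x, y))]

def pullback(f, X, partition_Y):
    blocks = (frozenset(x for x in X if f[x] in B) for B in partition_Y)
    return {b for b in blocks if b}

def refines(Pi, Sigma):
    return all(any(b <= c for c in Sigma) for b in Pi)

# Assumed assignment of the edge lengths {4, 3, 5} and {4, 3, 2}.
dX = metric({frozenset("BC"): 4, frozenset("AC"): 3, frozenset("AB"): 5})
dY = metric({frozenset("bc"): 4, frozenset("ac"): 3, frozenset("ab"): 2})
f = {"A": "a", "B": "b", "C": "c"}

PX = partition_at("ABC", dX, complete, 3.5)
PY = partition_at("abc", dY, complete, 3.5)
SX = partition_at("ABC", dX, single, 3.5)
SY = partition_at("abc", dY, single, 3.5)
```

With this assignment complete linkage produces the two partitions quoted in the text and the refinement fails, while single linkage on the same data satisfies it, in line with the uniqueness theorem that follows.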

Theorem 4.1. Let $\Psi : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{P}}$ be a functor which satisfies the following conditions.

(I): Let $\alpha : \underline{\mathcal{M}}^{gen} \to \underline{Sets}$ and $\beta : \underline{\mathcal{P}} \to \underline{Sets}$ be the forgetful functors $(X, d_X) \mapsto X$ and $(X, \theta) \mapsto X$,
which forget the metric and partition respectively, and only "remember" the underlying sets $X$. Then
we assume that $\beta \circ \Psi = \alpha$. This means that the underlying set of the persistent set associated to a
metric space is just the underlying set of the metric space.

(II): For $\delta \ge 0$ let $Z(\delta) = \left(\{p, q\}, \left(\begin{smallmatrix} 0 & \delta \\ \delta & 0 \end{smallmatrix}\right)\right)$ denote the two point metric space with underlying set $\{p, q\}$,
and where $dist(p, q) = \delta$. Then $\Psi(Z(\delta))$ is the persistent set $(\{p, q\}, \theta^{Z(\delta)})$ whose underlying set is
$\{p, q\}$ and where $\theta^{Z(\delta)}(t)$ is the partition with one element blocks when $t < \delta$ and it is the partition
with a single two point block when $t \ge \delta$.

(III): Given a finite metric space $(X, d_X)$, let

$sep(X) := \min_{x \ne x'} d_X(x, x')$.

Write $\Psi(X, d_X) = (X, \theta^\Psi)$, then for any $t < sep(X)$, the partition $\theta^\Psi(t)$ is the discrete partition with
one element blocks.

Then $\Psi$ is equal to the functor $\mathcal{R}^{gen}$.

Proof. Let $\Psi(X, d_X) = (X, \theta^{\Psi})$. For each $r \geq 0$ we will prove that (a) $\theta^{d_X}(r)$ is a refinement of $\theta^{\Psi}(r)$ and 
(b) $\theta^{\Psi}(r)$ is a refinement of $\theta^{d_X}(r)$. 

2 The result in Theorem 4.1 is actually more powerful in that it states that there is a unique functor from $\mathcal{M}^{gen}$ to $\mathcal{P}$ that 
satisfies certain natural conditions. 

Figure 4. An example that shows why complete linkage fails to be functorial on $\mathcal{M}^{gen}$. 



Then it will follow that $\theta^{d_X}(r) = \theta^{\Psi}(r)$ for all $r \geq 0$, which shows that the objects are the same. Since 
this is a situation where, given any pair of objects, there is at most one morphism between them, this also 
determines the effect of the functor on morphisms. 

Fix $r \geq 0$. In order to obtain (a) we need to prove that whenever $x, x' \in X$ lie in the same block of the 
partition $\theta^{d_X}(r)$, that is $x \sim_r x'$, then they both lie in the same block of $\theta^{\Psi}(r)$. 

It is enough to prove the following Claim: whenever 
$d_X(x, x') \leq r$ then $x$ and $x'$ lie in the same block of $\theta^{\Psi}(r)$. 

Indeed, if the claim is true, and $x \sim_r x'$, then one can find $x_0, x_1, \ldots, x_n$ with $x_0 = x$, $x_n = x'$ and 
$d_X(x_i, x_{i+1}) \leq r$ for $i = 0, 1, 2, \ldots, n-1$. Then, invoking the claim for all pairs $(x_i, x_{i+1})$, $i = 0, \ldots, n-1$, 
one would find that $x = x_0$ and $x_1$ lie in the same block of $\theta^{\Psi}(r)$, $x_1$ and $x_2$ lie in the same block of $\theta^{\Psi}(r)$, 
$\ldots$, $x_{n-1}$ and $x_n = x'$ lie in the same block of $\theta^{\Psi}(r)$. Hence, $x$ and $x'$ lie in the same block of $\theta^{\Psi}(r)$. 

So, let's prove the claim. Assume $d_X(x, x') \leq r$; then the function given by $p \mapsto x$, $q \mapsto x'$ is a morphism 
$g : Z(r) \to (X, d_X)$ in $\mathcal{M}^{gen}$. This means that we obtain a morphism 

$$\Psi(g) : \Psi(Z(r)) \to \Psi(X, d_X)$$

in $\mathcal{P}$. But $p$ and $q$ lie in the same block of the partition $\theta^{Z(r)}(r)$ by definition of $Z(r)$ and hypothesis (II), and functoriality therefore 
guarantees that $\Psi(g)$ is persistence preserving (recall Example 3.2), and hence the elements $g(p) = x$ and 
$g(q) = x'$ lie in the same block of $\theta^{\Psi}(r)$. This concludes the proof of (a). 

For condition (b), assume that $x$ and $x'$ belong to the same block of the partition $\theta^{\Psi}(r)$. We will prove 
that necessarily $x \sim_r x'$. This of course will imply that $x$ and $x'$ belong to the same block of $\theta^{d_X}(r)$. 

Consider the metric space $(X[r], d_{[r]})$ whose points are the equivalence classes of $X$ under the equivalence 
relation $\sim_r$, and where the metric $d_{[r]} : X[r] \times X[r] \to \mathbb{R}_+$ is defined to be the maximal metric pointwise 
less than or equal to $W$, where for two points $B$ and $B'$ in $X[r]$ (equivalence classes of $X$ under $\sim_r$), 
$W(B, B') = \min_{x \in B} \min_{x' \in B'} d_X(x, x')$.³ It follows from the definition of $\sim_r$ that if two equivalence classes 
are distinct, then the distance between them is $> r$. This means that $sep(X[r]) > r$. 

3 See Section 5 for a similar, explicit construction. 

Write $\Psi(X[r], d_{[r]}) = (X[r], \theta^{[r]})$. Since $sep(X[r]) > r$, hypothesis (III) now directly shows that the 
blocks of the partition $\theta^{[r]}(r)$ are exactly the equivalence classes of $X$ under the equivalence relation $\sim_r$, 
that is $\theta^{[r]}(r) = \theta^{d_X}(r)$. Finally, consider the morphism 

$$\pi_r : (X, d_X) \to (X[r], d_{[r]})$$

in $\mathcal{M}^{gen}$ given on elements $x \in X$ by $\pi_r(x) = [x]_r$, where $[x]_r$ denotes the equivalence class of $x$ under $\sim_r$. 
By functoriality, $\Psi(\pi_r) : (X, \theta^{\Psi}) \to (X[r], \theta^{[r]})$ is persistence preserving, and therefore $\theta^{\Psi}(r)$ is a refinement 
of $\theta^{[r]}(r) = \theta^{d_X}(r)$. This is depicted as follows: 



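$$\begin{array}{ccc}
(X, d_X) & \xrightarrow{\;\Psi\;} & (X, \theta^{\Psi}) \\
\downarrow{\scriptstyle \pi_r} & & \downarrow{\scriptstyle \Psi(\pi_r)} \\
(X[r], d_{[r]}) & \xrightarrow{\;\Psi\;} & (X[r], \theta^{[r]})
\end{array}$$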



This concludes the proof of (b). □ 

We should point out that another characterization of single linkage has been obtained in the book [JS71]. 

4.2.1. Comments on Kleinberg's conditions. We conclude this section by observing that analogues of the 
three (axiomatic) properties considered by Kleinberg in [Kle02] hold for $\mathcal{R}^{gen}$. 

Kleinberg's first condition was scale-invariance, which asserted that if the distances in the underlying point 
cloud data were multiplied by a positive constant $\lambda$, then the resulting clustering decomposition 
should be identical. In our case, this is replaced by the condition that $\mathcal{R}^{gen} \circ \sigma_\lambda(X, d_X) = s_\lambda \circ \mathcal{R}^{gen}(X, d_X)$, 
which is trivially satisfied. 

Kleinberg's second condition, richness, asserts that any partition of a dataset can be obtained as the 
result of the given clustering scheme for some metric on the dataset. In our context, partitions are replaced 
by persistent sets. Assume that there exists $t \in \mathbb{R}$ s.t. $\theta(t)$ is the single block partition, i.e., impose that the 
persistent set is a dendrogram (cf. the definition of a dendrogram). In this case, it is easy to check that any such persistent set 
can be obtained as $\mathcal{R}^{gen}$ evaluated for some (pseudo)metric on some dataset. Indeed,⁴ let $(X, \theta) \in ob(\mathcal{P})$. Let 
$\varepsilon_1, \ldots, \varepsilon_k$ be the (finitely many) transition/discontinuity points of $\theta$. For $x, x' \in X$ define 

$$d_X(x, x') = \min\{\varepsilon_i \ \text{s.t.}\ x, x' \text{ belong to the same block of } \theta(\varepsilon_i)\}.$$

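This is a pseudo metric on $X$. Indeed, pick points $x, x'$ and $x''$ in $X$. Let $\varepsilon_1$ and $\varepsilon_2$ be minimal s.t. $x, x'$ 
belong to the same block of $\theta(\varepsilon_1)$ and $x', x''$ belong to the same block of $\theta(\varepsilon_2)$. Let $\varepsilon_{12} := \max(\varepsilon_1, \varepsilon_2)$. Since 
$(X, \theta)$ is a persistent set (Definition 2.1), $\theta(\varepsilon_{12})$ must have a block $B$ s.t. $x, x'$ and $x''$ all lie in $B$. Hence 
$d_X(x, x'') \leq \varepsilon_{12} \leq \varepsilon_1 + \varepsilon_2 = d_X(x, x') + d_X(x', x'')$. 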
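
The richness argument can be illustrated numerically. The sketch below is a minimal, hedged example: it represents a dendrogram as a hypothetical dictionary mapping each transition point to the partition that holds from that point on, builds the (pseudo)metric defined above (setting the distance from a point to itself to zero, which is an assumption of this sketch), and then checks that single linkage at each transition point returns the original partition. The function names and the toy dendrogram are not from the source.

```python
from itertools import combinations

def metric_from_dendrogram(theta):
    """theta: {eps_i: partition as a list of sets}; the last partition is one block."""
    points = sorted(set().union(*next(iter(theta.values()))))
    d = {(x, x): 0.0 for x in points}  # assumption: d(x, x) = 0
    for x, y in combinations(points, 2):
        # Smallest transition point at which x and y share a block.
        d[x, y] = d[y, x] = min(e for e, part in theta.items()
                                if any(x in b and y in b for b in part))
    return points, d

def single_linkage(points, d, r):
    """Equivalence classes of ~_r: chains whose consecutive distances are <= r."""
    blocks, seen = [], set()
    for p in points:
        if p in seen:
            continue
        block, stack = {p}, [p]
        while stack:
            u = stack.pop()
            for v in points:
                if v not in block and d[u, v] <= r:
                    block.add(v)
                    stack.append(v)
        seen |= block
        blocks.append(frozenset(block))
    return set(blocks)

# Toy dendrogram (hypothetical data).
theta = {
    0.0: [{"a"}, {"b"}, {"c"}, {"d"}],
    1.0: [{"a", "b"}, {"c"}, {"d"}],
    2.5: [{"a", "b"}, {"c", "d"}],
    4.0: [{"a", "b", "c", "d"}],
}
points, d = metric_from_dendrogram(theta)
recovered = all(single_linkage(points, d, e) == {frozenset(b) for b in part}
                for e, part in theta.items())
print(recovered)  # prints: True
```

In this toy case the constructed distance also satisfies the stronger inequality $d_X(x, x'') \leq \max(d_X(x, x'), d_X(x', x''))$, in line with the bound $\varepsilon_{12} = \max(\varepsilon_1, \varepsilon_2)$ used in the argument.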

Finally, Kleinberg's third condition, consistency, could be viewed as a rudimentary example of functoriality. 
His morphisms are similar to the ones in $\mathcal{M}^{gen}$. 

4.3. Functoriality over $\mathcal{M}^{mon}$. In this section, we illustrate how relaxing the functoriality permits more 
clustering algorithms. In other words, we will restrict ourselves to $\mathcal{M}^{mon}$, which is smaller (less stringent) 
than $\mathcal{M}^{gen}$ but larger (more stringent) than $\mathcal{M}^{iso}$. We consider the restriction of $\mathcal{R}^{gen}$ to the category 
$\mathcal{M}^{mon}$. For any metric space and every value of the persistence parameter $r$, we will obtain a partition of 
the underlying set $X$ of the metric space in question, namely the set of equivalence classes under $\sim_r$. For any 
$x \in X$, let $[x]_r$ be the equivalence class of $x$ under the equivalence relation $\sim_r$, and define $c(x) = \#[x]_r$. 
For any integer $m$, we now define $X_m \subset X$ by $X_m = \{x \in X \mid c(x) \geq m\}$. We note that for any morphism 
$f : X \to Y$ in $\mathcal{M}^{mon}$, we find that $f(X_m) \subset Y_m$. This property clearly does not hold for more general 
morphisms. For every $r$, we can now define a new equivalence relation $\sim_r^m$ on $X$, which refines $\sim_r$, by 
requiring that each equivalence class of $\sim_r$ which has cardinality $\geq m$ is an equivalence class of $\sim_r^m$, and 
that for any $x$ for which $c(x) < m$, $x$ defines a singleton equivalence class in $\sim_r^m$. We now obtain a new 
persistent set $(X, \theta^m)$, where $\theta^m(r)$ will denote the partition associated to the equivalence relation $\sim_r^m$. It is 
readily checked that $X \mapsto (X, \theta^m)$ is functorial on $\mathcal{M}^{mon}$. This scheme could be motivated by the intuition 
that one does not regard clusters of small cardinality as significant, and therefore makes points lying in small 
clusters into singletons, where one can then remove them as representing "outliers". 
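
As a hedged illustration of the scheme just described, the following Python sketch computes the partition $\theta^m(r)$ for a small point set on the real line: single linkage classes of size at least $m$ are kept, and every point in a smaller class becomes a singleton. The function names and the sample data are hypothetical and serve only to make the definition concrete.

```python
def single_linkage_blocks(points, dist, r):
    """Equivalence classes of ~_r, found by flood fill over pairs at distance <= r."""
    blocks, seen = [], set()
    for p in points:
        if p in seen:
            continue
        block, stack = {p}, [p]
        while stack:
            u = stack.pop()
            for v in points:
                if v not in block and dist(u, v) <= r:
                    block.add(v)
                    stack.append(v)
        seen |= block
        blocks.append(frozenset(block))
    return blocks

def theta_m(points, dist, r, m):
    """Partition for ~_r^m: keep classes with at least m points, split the rest."""
    result = set()
    for block in single_linkage_blocks(points, dist, r):
        if len(block) >= m:
            result.add(block)
        else:
            result.update(frozenset([x]) for x in block)
    return result

# Hypothetical sample on the real line with the usual distance.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
dist = lambda x, y: abs(x - y)
partition = theta_m(points, dist, r=1.5, m=3)
print(sorted(sorted(b) for b in partition))
# prints: [[0.0, 1.0, 2.0], [10.0], [11.0], [30.0]]
```

Here the pair $\{10, 11\}$ is a single linkage cluster at $r = 1.5$, but it has fewer than $m = 3$ points, so both of its points are reported as singletons.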



4 We only prove triangle inequality. 

5. Metric stability and convergence properties of $\mathcal{R}^{gen}$ 



In this section we briefly discuss some further properties of $\mathcal{R}^{gen}$ (single linkage dendrograms). We will 
provide quantitative results on the stability and convergence/consistency properties of this functor (algorithm). 
To the best of our knowledge, the only other related results obtained for this algorithm appear in 
[Har81]. The issues of stability and convergence/consistency of clustering algorithms have been brought back into 
attention recently, see [vLBD05, BDvLP06] and references therein. In Theorem 5.1, besides proving stability, 
we prove convergence in a simple setting. 

Given finite metric spaces $(X, d_X)$ and $(Y, d_Y)$, our goal is to define a distance between the persistence 
objects $\theta^{d_X}$ and $\theta^{d_Y}$ respectively produced by $\mathcal{R}^{gen}$. We know that this functor actually outputs dendrograms 
(rooted trees), which have a natural metric structure attached to them. Moreover, it is well known that 
rooted trees are uniquely characterized by their distance matrix, [SS03]. 

For a finite metric space $(X, d_X)$ consider the derived metric space $(X, \varepsilon_X)$ with the same underlying set 
and metric 

(5-3) $$\varepsilon_X(x, x') := \min\{\varepsilon \geq 0 \mid x \sim_\varepsilon x'\}.$$

Note that $(X, \theta^{d_X})$ can therefore be regarded as the metric space $(X, \varepsilon_X)$; cf. the 
construction of the metric in Section 4.2.1. We now check that $\varepsilon_X$ indeed defines a metric on $X$. 

Proposition 5.1. For any finite metric space $(X, d_X)$, $(X, \varepsilon_X)$ is also a metric space. 

Proof. (a) Since $d_X$ is a metric on $X$, it is obvious that $\varepsilon_X(x, x') = 0$ implies $x = x'$. (b) Symmetry is also 
obvious since $\sim_\varepsilon$ is an equivalence relation. (c) Triangle inequality: Pick $x, x', x'' \in X$. Let $\varepsilon_X(x, x') = \varepsilon_1$ 
and $\varepsilon_X(x', x'') = \varepsilon_2$. Then, there exist points $a_0, a_1, \ldots, a_j$ and $b_0, b_1, \ldots, b_k$ in $X$ with $a_0 = x$, $a_j = x' = b_0$, 
$b_k = x''$ and $d_X(a_i, a_{i+1}) \leq \varepsilon_1$ for $i = 0, \ldots, j-1$ and $d_X(b_i, b_{i+1}) \leq \varepsilon_2$ for $i = 0, \ldots, k-1$. Consider the 
points $\{c_i\}_{i=0}^{j+k+1} = \{a_0, \ldots, a_j, b_0, \ldots, b_k\}$. 

Then $d_X(c_i, c_{i+1}) \leq \max(\varepsilon_1, \varepsilon_2) \leq \varepsilon_1 + \varepsilon_2$. Hence $x \sim_{\varepsilon_{12}} x''$ with $\varepsilon_{12} = \varepsilon_1 + \varepsilon_2$ and then by definition (5-3), 
$\varepsilon_X(x, x'') \leq \varepsilon_X(x, x') + \varepsilon_X(x', x'')$. □ 
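
For concreteness, $\varepsilon_X$ can be computed from a distance matrix as a minimax path distance: the smallest $\varepsilon$ for which a chain with all steps at most $\varepsilon$ joins two points. The sketch below uses a Floyd-Warshall style recursion with `max` in place of addition. This is one standard way to compute such bottleneck distances and is offered as an illustration under that assumption, not as the authors' procedure; the function name and the data are hypothetical.

```python
def epsilon_metric(d):
    """Given a symmetric distance matrix d (list of lists), return eps with
    eps[i][j] = min over chains from i to j of the largest step in the chain."""
    n = len(d)
    eps = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Either keep the best chain so far, or route through k.
                eps[i][j] = min(eps[i][j], max(eps[i][k], eps[k][j]))
    return eps

# Hypothetical example: four points on the real line.
xs = [0, 1, 3, 7]
d = [[abs(a - b) for b in xs] for a in xs]
eps = epsilon_metric(d)
print(eps)
# prints: [[0, 1, 2, 4], [1, 0, 2, 4], [2, 2, 0, 4], [4, 4, 4, 0]]
```

The output satisfies $\varepsilon_X(x, x'') \leq \max(\varepsilon_X(x, x'), \varepsilon_X(x', x''))$, which is the intermediate bound appearing in step (c) of the proof.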

In order to compare the outputs of $\mathcal{R}^{gen}$ on two different finite metric spaces $(X, d_X)$ and $(X', d_{X'})$ we 
will instead compare the metric space representations of those outputs, $(X, \varepsilon_X)$ and $(X', \varepsilon_{X'})$, respectively. 
For this purpose, we choose to work with the Gromov-Hausdorff distance, which we define now, [BBI01]. 

Definition 5.1 (Correspondence). For sets $A$ and $B$, a subset $R \subset A \times B$ is a correspondence (between $A$ 
and $B$) if and only if 

• $\forall\, a \in A$, there exists $b \in B$ s.t. $(a, b) \in R$ 

• $\forall\, b \in B$, there exists $a \in A$ s.t. $(a, b) \in R$ 

Let $\mathcal{R}(A, B)$ denote the set of all possible correspondences between sets $A$ and $B$. 

Consider finite metric spaces $(X, d_X)$ and $(Y, d_Y)$. Let $\Gamma_{X,Y} : X \times Y \times X \times Y \to \mathbb{R}_+$ be given by 

$$(x, y, x', y') \mapsto |d_X(x, x') - d_Y(y, y')|.$$

Then, the Gromov-Hausdorff distance between $X$ and $Y$ is given by 

(5-4) $$d_{GH}(X, Y) := \inf_{R \in \mathcal{R}(X, Y)} \ \max_{(x, y), (x', y') \in R} \Gamma_{X,Y}(x, y, x', y').$$

Remark 5.1. This expression defines a metric on the set of (isometry classes of) finite metric spaces, 
[BBI01] (Theorem 7.3.30). 

One has: 

Proposition 5.2. For any finite metric spaces $(X, d_X)$ and $(Y, d_Y)$ 

$$d_{GH}((X, d_X), (Y, d_Y)) \geq d_{GH}((X, \varepsilon_X), (Y, \varepsilon_Y)).$$

Proof. Let $\eta = d_{GH}((X, d_X), (Y, d_Y))$ and $R \in \mathcal{R}(X, Y)$ s.t. $|d_X(x, x') - d_Y(y, y')| \leq \eta$ for all $(x, y), (x', y') \in R$. 
Fix $(x, y)$ and $(x', y') \in R$. Let $x_0, \ldots, x_m \in X$ be s.t. $x_0 = x$, $x_m = x'$ and $d_X(x_i, x_{i+1}) \leq \varepsilon_X(x, x')$ for 
all $i = 0, \ldots, m-1$. Let $y = y_0, y_1, \ldots, y_{m-1}, y_m = y' \in Y$ be s.t. $(x_i, y_i) \in R$ for all $i = 0, \ldots, m$ (this is 
possible by definition of $R$). Then, $d_Y(y_i, y_{i+1}) \leq d_X(x_i, x_{i+1}) + \eta \leq \varepsilon_X(x, x') + \eta$ for all $i = 0, \ldots, m-1$, 
and hence $\varepsilon_Y(y, y') \leq \varepsilon_X(x, x') + \eta$. By exchanging the roles of $X$ and $Y$ one obtains the inequality 
$\varepsilon_X(x, x') \leq \varepsilon_Y(y, y') + \eta$. This means $|\varepsilon_X(x, x') - \varepsilon_Y(y, y')| \leq \eta$. Since $(x, y), (x', y') \in R$ are arbitrary, and 
upon recalling the definition of the Gromov-Hausdorff distance, we obtain the desired conclusion. □ 
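
For very small spaces, the distance (5-4) and the inequality of Proposition 5.2 can be checked by brute force. The sketch below enumerates every correspondence between two finite sets, which is exponential in the number of pairs and only feasible for a handful of points. It follows the normalization of (5-4) as written here (no factor of one half). The function names and sample data are hypothetical.

```python
from itertools import product

def epsilon_metric(d):
    """Minimax path distance derived from a distance matrix d."""
    n = len(d)
    eps = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                eps[i][j] = min(eps[i][j], max(eps[i][k], eps[k][j]))
    return eps

def gh_distance(dX, dY):
    """Brute-force evaluation of (5-4) over all correspondences R."""
    nX, nY = len(dX), len(dY)
    pairs = list(product(range(nX), range(nY)))
    best = float("inf")
    for mask in range(1, 1 << len(pairs)):
        R = [p for b, p in enumerate(pairs) if (mask >> b) & 1]
        # R must project onto all of X and all of Y.
        if {x for x, _ in R} != set(range(nX)) or {y for _, y in R} != set(range(nY)):
            continue
        distortion = max(abs(dX[x][xp] - dY[y][yp]) for x, y in R for xp, yp in R)
        best = min(best, distortion)
    return best

# Hypothetical three-point samples on the real line.
xs, ys = [0, 1, 3], [0, 1, 5]
dX = [[abs(a - b) for b in xs] for a in xs]
dY = [[abs(a - b) for b in ys] for a in ys]
lhs = gh_distance(dX, dY)
rhs = gh_distance(epsilon_metric(dX), epsilon_metric(dY))
print(lhs >= rhs)  # prints: True
```

A quick sanity check of the normalization: for a two point space with distance 5 and a one point space, the only correspondence pairs both points with the single point, so `gh_distance([[0, 5], [5, 0]], [[0]])` evaluates to 5.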

Proposition 5.2 will allow us to quantify stability and convergence. We provide deterministic arguments. 
The same construction, essentially, yields similar results under the assumption that $(Z, d_Z)$ is enriched with 
a (Borel) probability measure and one takes i.i.d. samples w.r.t. this probability measure. Assume $(Z, d_Z)$ is 
an underlying (perhaps "continuous") metric space from which different finite samples are drawn. We would 
like to see, quantitatively, (1) how the results yielded by $\mathcal{R}^{gen}$ differ when applied to those different sample 
sets (which possibly contain different numbers of points), this is stability, and (2) that when the underlying 
metric space is partitioned, there is convergence and consistency in a precise sense. 

Assume $A$ is a finite set and let $W : A \times A \to \mathbb{R}_+$ be a symmetric map. Using the usual path-length 
construction, we endow $A$ with the (pseudo) metric 

$$d_A(a, a') := \min \sum_{k=0}^{m-1} W(a_k, a_{k+1})$$

where the minimum is taken over $m$ and all sets of $m + 1$ points $a_0, \ldots, a_m$ such that $a_0 = a$ and $a_m = a'$. 
We denote $d_A = \mathcal{L}(W)$. This is a standard construction, see [BH99] §1.24. 
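
The path-length construction is an all-pairs shortest path computation on the complete graph with edge weights $W$. A minimal sketch using the Floyd-Warshall recursion follows; it assumes $W$ is given as a symmetric matrix with zero diagonal. The function name and the numerical weights are hypothetical.

```python
def path_length_metric(W):
    """d_A = L(W): shortest total weight over all chains between two points."""
    n = len(W)
    d = [row[:] for row in W]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Either keep the best chain so far, or pass through k.
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Hypothetical weights on three points: the direct weight 10 between
# points 0 and 1 exceeds the detour through point 2 (3 + 4).
W = [[0, 10, 3],
     [10, 0, 4],
     [3, 4, 0]]
dA = path_length_metric(W)
print(dA)  # prints: [[0, 7, 3], [7, 0, 4], [3, 4, 0]]
```

The result satisfies $d_A \leq W$ entrywise, and $d_A$ obeys the triangle inequality even when $W$ does not.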

For a compact metric space $(Z, d_Z)$ and any two of its compact subsets $Z_1, Z_2$ let 

$$D_Z(Z_1, Z_2) = \min_{z_1 \in Z_1} \min_{z_2 \in Z_2} d_Z(z_1, z_2).$$

For $(Z, d_Z)$ compact and $X, X' \subset Z$ compact, let $d_H^Z(X, X')$ denote the Hausdorff distance (in $Z$) between 
$X$ and $X'$, [BBI01]. For any $X \subset Z$ let $R(X) := d_H^Z(X, Z)$. Intuitively this number measures how well $X$ 
approximates $Z$. One says that $X$ is an $R(X)$-covering of $Z$ or an $R(X)$-net of $Z$. 

The following theorem summarizes our main results regarding metric stability and convergence/consistency. 
The situation described by the theorem is depicted in Figure 5. 

Theorem 5.1. Assume $(Z, d_Z)$ is a compact metric space. Let $X$ and $X'$ be any two finite sets of points 
sampled from $Z$. Endow these two sets with the (restricted) metric $d_Z$. Then, 

(1) (Finite Stability) $d_{GH}((X, \varepsilon_X), (X', \varepsilon_{X'})) \leq 2(R(X) + R(X'))$. 

(2) (Asymptotic Stability) As $\max(R(X), R(X')) \to 0$ one has $d_{GH}((X, \varepsilon_X), (X', \varepsilon_{X'})) \to 0$. 

(3) (Convergence/consistency) Assume in addition that $Z = \bigcup_{\alpha \in A} Z_\alpha$ where $A$ is a finite index set 
and the $Z_\alpha$ are compact, disjoint and path-connected sets. Let $(A, d_A)$ be the finite metric space with 
underlying set $A$ and metric given by $d_A := \mathcal{L}(W)$ where $W(\alpha, \alpha') := D_Z(Z_\alpha, Z_{\alpha'})$ for $\alpha, \alpha' \in A$.⁵ 
Then, as $R(X) \to 0$ one has 

$$d_{GH}((X, \varepsilon_X), (A, \varepsilon_A)) \to 0.$$

Proof. Let $\delta > 0$ be s.t. $\min_{\alpha \neq \beta} D_Z(Z_\alpha, Z_\beta) \geq \delta$. 

Claim 1 follows from Proposition 5.2: let $d_X$ (resp. $d_{X'}$) equal the restriction of $d_Z$ to $X \times X$ (resp. 
$X' \times X'$). Then, by the triangle inequality for the Gromov-Hausdorff distance and Proposition 5.2, 

$$d_{GH}(X, Z) + d_{GH}(X', Z) \geq d_{GH}((X, d_X), (X', d_{X'})) \geq d_{GH}((X, \varepsilon_X), (X', \varepsilon_{X'})).$$

Now, the claim follows from the fact that whenever $Z' \subset Z$, $d_{GH}(Z', Z) \leq 2 d_H^Z(Z, Z') = 2R(Z')$, [BBI01], 
§7.3. 

Claim 2 follows directly from claim 1. 

We now prove the third claim. For each $x \in X$ let $\alpha(x)$ denote the index of the path connected component 
of $Z$ s.t. $x \in Z_{\alpha(x)}$. Assume $R(X) < \delta/2$. Then, it is clear that $\#(Z_\alpha \cap X) \geq 1$ for all $\alpha \in A$. Then it follows 
that $R = \{(x, \alpha(x)) \mid x \in X\}$ belongs to $\mathcal{R}(X, A)$. We prove below that for all $x, x' \in X$ 

$$\varepsilon_A(\alpha(x), \alpha(x')) \overset{(1)}{\leq} \varepsilon_X(x, x') \overset{(2)}{\leq} \varepsilon_A(\alpha(x), \alpha(x')) + 2R(X).$$

It follows immediately from the definition of $W$ that for all $y, y' \in X$, $W(\alpha(y), \alpha(y')) \leq d_X(y, y')$. From 
the definition of $d_A$ it follows that $W(\alpha, \alpha') \geq d_A(\alpha, \alpha')$. Then in order to prove (1) pick $x_0, \ldots, x_m$ in $X$ with 
$x_0 = x$, $x_m = x'$ and $d_X(x_i, x_{i+1}) \leq \varepsilon_X(x, x')$. Consider the points in $A$ given by $\alpha(x) = \alpha(x_0), \ldots, \alpha(x_m) = \alpha(x')$. 
Then, $d_A(\alpha(x_i), \alpha(x_{i+1})) \leq W(\alpha(x_i), \alpha(x_{i+1})) \leq d_X(x_i, x_{i+1}) \leq \varepsilon_X(x, x')$ for $i = 0, \ldots, m-1$ by the 
claim above. Hence (1) follows. 

5 Since the $Z_\alpha$ are disjoint, $d_A$ is a true metric on $A$. 

Figure 5. Explanation of Theorem 5.1. Top: A space $Z$ composed of 3 disjoint path 
connected parts, $Z_1$, $Z_2$ and $Z_3$. The black dots are the points in the finite sample $X$. In 
the figure, $w_{ij} = D_Z(Z_i, Z_j)$, $1 \leq i \neq j \leq 3$. Bottom Left: The dendrogram representation 
of $(X, \theta^{d_X})$. Bottom Right: The dendrogram representation of the persistent set $(A, \theta^{d_A})$. 
Note that $d_A(\alpha_1, \alpha_2) = w_{13} + w_{23}$, $d_A(\alpha_1, \alpha_3) = w_{13}$ and $d_A(\alpha_2, \alpha_3) = w_{23}$. As $R(X) \to 0$, 
$(X, \theta^{d_X}) \to (A, \theta^{d_A})$ in the Gromov-Hausdorff sense, see text for details. 

We now prove (2). Assume first that $\alpha(x) = \alpha(x') = \alpha$. Fix $\varepsilon_0 > 0$ small. Let $\gamma : [0, 1] \to Z_\alpha$ be a 
continuous path s.t. $\gamma(0) = x$ and $\gamma(1) = x'$. Let $z_0, \ldots, z_m$ be points on $image(\gamma)$ s.t. $z_0 = x$, $z_m = x'$ 
and $d_Z(z_i, z_{i+1}) \leq \varepsilon_0$, $i = 0, \ldots, m-1$. By hypothesis, one can find $x = x_0, x_1, \ldots, x_{m-1}, x_m = x'$ in $X$ s.t. 
$d_Z(x_i, z_i) \leq R(X)$. Hence $d_X(x_i, x_{i+1}) \leq \varepsilon_0 + 2R(X)$ and hence $\varepsilon_X(x, x') \leq \varepsilon_0 + 2R(X)$. Let $\varepsilon_0 \to 0$ to 
obtain the desired result. 

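Now if $\alpha = \alpha(x) \neq \alpha(x') = \beta$, let $\alpha_0, \alpha_1, \ldots, \alpha_l \in A$ be s.t. $\alpha_0 = \alpha(x)$, $\alpha_l = \alpha(x')$ and $d_A(\alpha_j, \alpha_{j+1}) \leq 
\varepsilon_A(\alpha(x), \alpha(x'))$ for $j = 0, \ldots, l-1$. 

By definition of $d_A$, for each $j = 0, \ldots, l-1$ one can find a path $C_j = \{\alpha_j^{(0)}, \ldots, \alpha_j^{(r_j)}\}$ s.t. $\alpha_j^{(0)} = \alpha_j$, 
$\alpha_j^{(r_j)} = \alpha_{j+1}$ and $\sum_{i=0}^{r_j - 1} W(\alpha_j^{(i)}, \alpha_j^{(i+1)}) = d_A(\alpha_j, \alpha_{j+1}) \leq \varepsilon_A(\alpha, \beta)$. It follows that $W(\alpha_j^{(i)}, \alpha_j^{(i+1)}) \leq \varepsilon_A(\alpha, \beta)$ 
for $i = 0, \ldots, r_j - 1$. Consider the path $C = \{\hat{\alpha}_0, \ldots, \hat{\alpha}_s\}$ in $A$ joining $\alpha$ to $\beta$ given by the concatenation of 
all the $C_j$. By eliminating repeated consecutive elements in $C$ if necessary, one can assume that $\hat{\alpha}_i \neq \hat{\alpha}_{i+1}$. 
By construction $W(\hat{\alpha}_i, \hat{\alpha}_{i+1}) \leq \varepsilon_A(\alpha, \beta)$ and $\hat{\alpha}_0 = \alpha$, $\hat{\alpha}_s = \beta$. We will now lift $C$ into a path in $Z$ joining $x$ 
to $x'$. 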

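Note that by compactness, for all $\nu, \mu \in A$, $\nu \neq \mu$, there exist $z^{\nu}_{\nu,\mu} \in Z_\nu$ and $z^{\mu}_{\nu,\mu} \in Z_\mu$ s.t. $W(\nu, \mu) = 
d_Z(z^{\nu}_{\nu,\mu}, z^{\mu}_{\nu,\mu})$. Consider the path $G$ in $Z$ given by 

$$G = \{x, z^{\hat{\alpha}_0}_{\hat{\alpha}_0,\hat{\alpha}_1}, z^{\hat{\alpha}_1}_{\hat{\alpha}_0,\hat{\alpha}_1}, \ldots, z^{\hat{\alpha}_s}_{\hat{\alpha}_{s-1},\hat{\alpha}_s}, x'\}.$$

For each point $g \in G$ pick a point $x(g) \in X$ s.t. $d_Z(g, x(g)) \leq R(X)$. This is possible by definition of $R(X)$. 
Let $G' = \{x_0, x_1, \ldots, x_T\}$ be the resulting path in $X$. Notice that if $\alpha(x_t) \neq \alpha(x_{t+1})$ then $d_X(x_t, x_{t+1}) \leq 
2R(X) + W(\alpha(x_t), \alpha(x_{t+1}))$ by the triangle inequality. Also, by construction, 

(*) $$W(\alpha(x_t), \alpha(x_{t+1})) \leq \varepsilon_A(\alpha, \beta).$$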

Now, we claim that 

$$\varepsilon_X(x, x') \leq \max_t W(\alpha(x_t), \alpha(x_{t+1})) + 2R(X).$$

This claim will follow from the simple observation that $\varepsilon_X(x, x') \leq \max_t \varepsilon_X(x_t, x_{t+1})$. If $\alpha(x_t) = \alpha(x_{t+1})$ 
we already proved that $\varepsilon_X(x_t, x_{t+1}) \leq 2R(X)$. If on the other hand $\alpha(x_t) \neq \alpha(x_{t+1})$ then $\varepsilon_X(x_t, x_{t+1}) \leq 
2R(X) + W(\alpha(x_t), \alpha(x_{t+1}))$ and hence the claim. Combine this fact with (*) to conclude the proof of (2). 
Putting (1) and (2) together we have 

$$d_{GH}((X, \varepsilon_X), (A, \varepsilon_A)) \leq 2R(X)$$
and the conclusion follows by letting $R(X) \to 0$. 

□ 

6. Functoriality and bootstrap clustering 

In the previous section, we have observed that encoding the output of a clustering scheme as a diagram 
(i.e. as a persistent set or dendrogram) allows one to assess stability of the clustering obtained from the 
scheme. In this section, we will demonstrate that functoriality can also be used to assess stability 
of clustering schemes whose output is simply a partition of the underlying point cloud. We begin by recalling 
the basics of the bootstrap method developed by B. Efron [Efr79]. The bootstrap considers a set of point cloud 
data X, and repeatedly samples (with replacement) collections of (say) n elements from X. For each sample, 
measures of central tendency such as means, medians, variances, are computed, and the distribution of 
these measures as a statistic is studied. It is understood that such computations are more informative than 
the measures computed a single time on the full set X. We wish to perform a similar analysis for clustering. 
The difficulty is that the output of clustering is not a single numerical statistic, but is rather a structural, 
qualitative output. We will now show how functoriality can be used to assess compatibility of clusterings of 
subsamples, and thereby obtain a method for confirming that clustering is a significant feature of the data 
rather than an artifact. 

In the context of clustering these bootstrapping ideas arise when dealing with massive datasets: one is 
forced to analyze several smaller, more manageable random subsamples of the original data to produce 
partial pictures of the underlying clustering structure. The problem then is how to agglomerate all this 
information together. 

In this section, for us, a clustering scheme will denote any rule $C$ which assigns to every finite metric 
space $S$ a partition $\mathcal{P}_C(S)$. We write $B_C(S)$ for the set of blocks of the partition $\mathcal{P}_C(S)$. If we are given 
two finite metric spaces $S, T$, an embedding from $S$ to $T$ is an injective set map $\iota : S \hookrightarrow T$, so that 
$d_T(\iota(x), \iota(x')) = d_S(x, x')$. Given any partition $\mathcal{P}$ of a metric space $T$, and given any set map $\varphi : S \to T$, we 
write $\varphi^*(\mathcal{P})$ for the partition of $S$ which places $s, s' \in S$ in the same block if and only if $\varphi(s)$ and $\varphi(s')$ lie in 
the same block of $\mathcal{P}$. The clustering scheme $C$ is now said to be I-functorial if $\mathcal{P}_C(S)$ refines $\iota^*(\mathcal{P}_C(T))$ 
for any embedding $\iota : S \hookrightarrow T$. Note that for any I-functorial clustering scheme $C$, there is an induced map 
$B_C(\iota) : B_C(S) \to B_C(T)$ for any embedding $\iota : S \hookrightarrow T$. An example of an I-functorial clustering scheme is 
single linkage clustering for a fixed threshold $\varepsilon$. 
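
To make the induced maps on blocks concrete, here is a minimal Python sketch for single linkage at a fixed threshold, applied to two overlapping subsamples of points on the real line and to their union. The embeddings are the plain inclusions. The sample values, the threshold and all function names are hypothetical choices for illustration.

```python
def single_linkage_blocks(points, dist, eps):
    """Blocks of single linkage clustering at the fixed threshold eps."""
    blocks, seen = [], set()
    for p in points:
        if p in seen:
            continue
        block, stack = {p}, [p]
        while stack:
            u = stack.pop()
            for v in points:
                if v not in block and dist(u, v) <= eps:
                    block.add(v)
                    stack.append(v)
        seen |= block
        blocks.append(frozenset(block))
    return blocks

def induced_map(blocks_S, blocks_T):
    """For an inclusion S -> T, send each block of S to the block of T containing it.
    This is well defined when the scheme is I-functorial."""
    return {b: next(c for c in blocks_T if b <= c) for b in blocks_S}

dist = lambda x, y: abs(x - y)
S1, S2 = [0, 1, 5], [1, 2, 5, 6]          # hypothetical subsamples
union = sorted(set(S1) | set(S2))

B1 = single_linkage_blocks(S1, dist, 1)
B2 = single_linkage_blocks(S2, dist, 1)
B12 = single_linkage_blocks(union, dist, 1)

plus = induced_map(B1, B12)    # B(S1) -> B(S1 u S2)
minus = induced_map(B2, B12)   # B(S2) -> B(S1 u S2)

# Blocks of S1 and S2 that land in the same block of the union are "compatible".
compatible = [(sorted(a), sorted(b)) for a in B1 for b in B2 if plus[a] == minus[b]]
print(compatible)
# prints: [([0, 1], [1, 2]), ([5], [5, 6])]
```

The two induced maps are the building blocks of the diagram of sets discussed next: matching images in the union identify which clusters of one subsample correspond to which clusters of the other.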

Now let $X$ be a set of point cloud data, equipped with a metric $d$. We build collections of samples $S_i \subset X$ 
of size $n$ from $X$, with replacement, for $1 \leq i \leq N$. We assume we are given an I-functorial clustering scheme 
$C$. We note that each of the samples $S_i$ and the sets $S_i \cup S_{i+1}$ are finite metric spaces in their own right, 
and that the natural inclusions $S_i \hookrightarrow S_i \cup S_{i+1}$ and $S_{i+1} \hookrightarrow S_i \cup S_{i+1}$ are embeddings of finite metric spaces. It 
follows from the I-functoriality of the clustering scheme $C$ that we obtain the diagram of sets shown in Figure 6. 



$$\cdots \; B(S_{i-1}) \to B(S_{i-1} \cup S_i) \leftarrow B(S_i) \to B(S_i \cup S_{i+1}) \leftarrow B(S_{i+1}) \; \cdots$$



Figure 6. Diagram of sets obtained via I-functoriality of the clustering scheme. 

We will refer to such a diagram as a zig zag. In an intuitive sense, this diagram now carries information 
about the stability or the significance of the clustering. The informal idea is that sequences of the form 
$\{x_\nu\}_{\nu=s}^{t}$ (formed by consecutive elements), with $x_\nu \in B_C(S_\nu)$, and with $B_C(i_\nu^+)(x_\nu) = B_C(i_{\nu+1}^-)(x_{\nu+1})$, should 
describe small scale pictures of a clustering of $X$, where $i_\nu^+ : S_\nu \hookrightarrow S_\nu \cup S_{\nu+1}$ and $i_{\nu+1}^- : S_{\nu+1} \hookrightarrow S_\nu \cup S_{\nu+1}$ 
are the inclusions. Informally, the idea is that "compatible families" of clusterings of the samples $S_\nu$ should 
correspond to clusterings of the entire set $X$. Of course, the length of the sequence $(t - s + 1)$ must be 
significant. A single pair of compatible clusters will not be as significant as a long sequence. The problem 
with this idea as stated is that it is very hard to make precise the definitions of the sequences, and to describe 
them. 

Unlike the case of ordinary persistent sets, where dendrograms provide a straightforward visualization of 
all such structure, we believe that in the case of zig-zags of sets no such simple representation is possible. 
However, there turns out to be (see below) a readily computable analogue of the persistence barcode, [Ghr08, 
ZC04]. We now see how this works. 

We note first that this situation has certain things in common with dendrograms. Rooted trees can be 
viewed as diagrams of sets of the form 

$$X_0 \xrightarrow{f_0} X_1 \xrightarrow{f_1} \cdots \to X_{n-1} \xrightarrow{f_{n-1}} X_n \to \cdots$$

for which there is an integer $N$ so that $X_k$ consists of one element for all $k \geq N$. The smallest such $N$ 
will be called the depth of the tree, $d$. One constructs a tree from such a diagram by forming the disjoint 
union $\coprod_i X_i \times [0, 1]$, and then forms the quotient by the equivalence relation generated by the equivalences 
$x \times 1 \sim f_s(x) \times 0$ for all $x \in X_s$ and $s < d$. The set $X_l$ will now correspond to the nodes of depth $d - l$ in 
the tree. The tree representation turns out to be a useful representation of structure of the sets of clusters 
as a set varying with a threshold parameter. Given instead a zig zag diagram as above, it is again possible 
to construct a graph which represents the data, but it is harder to make useful sense of it, since it is a 
fairly general graph. Nonetheless, it turns out that it is possible to obtain a useful partial description using 
algebraic techniques. 

One begins with a field $k$ (typically $\mathbb{F}_2$, the field with two elements), and constructs for each of the sets 
$B(S_i)$ and $B(S_i \cup S_{i+1})$ in the zig zag the corresponding vector spaces $k[B(S_i)]$ and $k[B(S_i \cup S_{i+1})]$, i.e. 
vector spaces with the given sets as bases. The zig zag diagram now gives rise to a diagram of vector spaces 
and linear transformations of the same shape. It turns out that there is an algebraic classification of such 
diagrams up to isomorphism. To describe this classification, we will describe every zig zag diagram as a 
family of vector spaces $\{V_i\}_i$, equipped with linear transformations $\lambda_i : V_{2i} \to V_{2i+1}$ and $\mu_i : V_{2i} \to V_{2i-1}$. 
Given integers $a \leq b$, we denote by $Z[a, b]$ the zig zag diagram for which $V_i = k$ for all $a \leq i \leq b$, and 
$V_i = \{0\}$ for $i \notin [a, b]$, and for which every possible non-zero linear transformation is equal to the identity. 
For example, $Z[3, 6]$ is the diagram 

$$\cdots \; \{0\} \to V_3 \leftarrow V_4 \to V_5 \leftarrow V_6 \to \{0\} \; \cdots$$

where $V_3 = V_4 = V_5 = V_6 = k$. Note that these diagrams are parametrized by closed intervals with integer 
endpoints. 

We now have the following theorem of Gabriel (see [GR97]). 

Theorem 6.1. Every zig zag diagram is isomorphic to a direct sum of diagrams of the form $Z[a_i, b_i]$, and the 
decomposition is unique up to reordering of the summands. 

7. Discussion 



We have presented the ideas of functoriality and persistence as useful organizing principles for clustering 
algorithms. We have made particular choices of category structures on the collection of finite metric spaces, 
as well as for the notion of multiscale/resolution sets. One can imagine different notions of morphisms of 
metric spaces and of persistent sets. For example, the idea of multidimensional persistence (see [CZ07]) could 
provide methods which in addition to the parameter r could track density as estimated by some estimator, 
giving a more informative picture of the dataset. It also appears likely that from the point of view described 
here, it will in many cases be possible, given a collection of constraints on a clustering functor, to determine 
the universal one satisfying the constraints. One could therefore use sets of constraints as the definition of 
clustering functors. 

We believe that the conceptual framework presented here can be a useful tool in reasoning about clustering 
algorithms. We have also shown that clustering methods which have some degree of functoriality admit the 
possibility of a certain kind of qualitative geometric analysis of datasets which can be quite valuable. The 
general idea that the morphisms between mathematical objects (together with the notion of functoriality) 
are critical in many situations is well-established in many areas of mathematics, and we would argue that it 
is valuable in this statistical situation as well. 

We have also discussed how to obtain quantitative stability, consistency and convergence results using a 
metric space representation of the output of clustering algorithms. We believe these tools can also contribute 
to the understanding of theoretical questions about clustering. 

Finally, we would like to comment on the fact that functoriality ideas and metric based study complement 
each other, in the sense that using functoriality, first, one can reason about global stability or rigidity of 
methods in order to identify a class of them that is sensible, and then, by applying metric tools, one can 
understand the behaviour/convergence as, say, the number of samples goes to infinity, or quantify the 
error in approximating the underlying reality when only finitely many samples are used. 